FIG. 2

LIST = StartList

PICK THE FIRST URL ON THE LIST

DELETE THIS URL X FROM LIST

RETRIEVE THE URL X FROM THE WORLD WIDE WEB BY USING HTTP OR FTP

PARSE THE RETRIEVED DOCUMENT X AND FIND THE SET OF OUTLINKS ON X

ADD THESE URL OUTLINKS TO LIST

260  UPDATE AGGREGATE STATISTICAL INFORMATION

ADD THE DOCUMENT X TO THE OUTPUT LIST

UPDATE THE PREDICATE-SPECIFIC STATISTICAL INFORMATION

RECALCULATE PRIORITIES FOR ALL URLs ON THE CANDIDATE LIST

RE-ORDER THE CANDIDATE LIST BASED ON THESE PRIORITIES

REPORT THE OUTPUT LIST

340  ( END )
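
The loop of FIG. 2 can be sketched in Python as follows. This is an illustrative sketch, not the claimed implementation. Every callable argument (`fetch`, `extract_outlinks`, `satisfies_predicate`, `update_aggregate`, `update_predicate`, `reorder`) is a hypothetical hook. The duplicate check, the `max_documents` stopping rule, the handling of failed retrievals, and the placement of the predicate test are assumptions, since the extracted figure text shows no decision boxes.

```python
def crawl(start_list, fetch, extract_outlinks, satisfies_predicate,
          update_aggregate, update_predicate, reorder, max_documents=1000):
    """Sketch of the FIG. 2 loop; every callable argument is a hypothetical hook."""
    candidates = list(start_list)        # LIST = StartList
    seen = set(candidates)               # assumption: never queue the same URL twice
    output = []
    fetched = 0
    while candidates and fetched < max_documents:   # assumption: stopping rule
        url = candidates.pop(0)          # pick the first URL and delete it from the list
        document = fetch(url)            # retrieve X, e.g. over HTTP or FTP
        fetched += 1
        if document is None:             # assumption: skip URLs that could not be retrieved
            continue
        for link in extract_outlinks(document):     # parse X and find its outlinks
            if link not in seen:
                seen.add(link)
                candidates.append(link)  # add the outlinks to the list
        update_aggregate(url, document)  # step 260: aggregate statistics
        if satisfies_predicate(document):            # assumption: predicate test here
            output.append(url)           # add X (recorded by its URL) to the output list
            update_predicate(url, document)          # predicate-specific statistics
        candidates = reorder(candidates) # recalculate priorities and re-order the list
    return output                        # report the output list
```

With an in-memory stand-in for the web, for example `web = {"a": ["b", "c"], "b": ["a"], "c": []}` and `fetch=web.get`, the sketch visits each URL once and returns the URLs whose documents pass the supplied predicate.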
FIG. 3

350  ( START )

UPDATE THE AGGREGATE WORD COUNTS WITH THE WORD COUNTS OF THE DOCUMENT X

370  TOKENIZE THE URL FOR DOCUMENT X

UPDATE THE AGGREGATE URL TOKEN COUNTS WITH THE TOKENS IN DOCUMENT X

390  ( STOP )



FIG. 4

400  ( START )

410  UPDATE THE PREDICATE-SPECIFIC WORD COUNTS WITH THE WORD COUNTS OF THE DOCUMENT X

420  TOKENIZE THE URL FOR DOCUMENT X

430  UPDATE THE PREDICATE-SPECIFIC URL TOKEN COUNTS WITH THE TOKENS IN DOCUMENT X

440  ( STOP )
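
FIGS. 3 and 4 apply the same update to two different sets of counters, so one routine can serve both. The sketch below is a minimal illustration under stated assumptions: the `Stats` class, the `update_stats` and `url_tokens` names, and the whitespace-and-lowercase word splitting are all hypothetical, and the URL tokenizer only approximates the procedure of FIG. 6.

```python
import re
from collections import Counter


def url_tokens(url):
    """Assumed tokenizer in the spirit of FIG. 6: drop "http://", split on "." and "/"."""
    if url.startswith("http://"):
        url = url[len("http://"):]
    return [token for token in re.split(r"[./\s]+", url) if token]


class Stats:
    """Hypothetical container for one set of word counts and URL token counts."""

    def __init__(self):
        self.words = Counter()
        self.url_tokens = Counter()


def update_stats(stats, text, url):
    """Fold the word counts and URL tokens of one document into `stats`."""
    # Assumption: words are lower-cased and separated by whitespace.
    stats.words.update(text.lower().split())
    stats.url_tokens.update(url_tokens(url))
```

One `Stats` instance would hold the aggregate counts of FIG. 3 and a second the predicate-specific counts of FIG. 4. Which documents feed the second instance is decided by the caller.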
FIG. 5A

500  ( START )

FIND THE SET OF WORDS W WHICH HAVE HIGHER REPRESENTATION IN PREDICATE-SPECIFIC STATISTICS THAN IN THE AGGREGATE STATISTICS

LET q BE THE PERCENTAGE OF WORDS IN W CONTAINED IN ALL DOCUMENTS ALREADY CRAWLED WHICH POINT (LINK) TO Y

FIND THE SET OF URL TOKENS U WHICH HAVE HIGHER REPRESENTATION IN PREDICATE-SPECIFIC STATISTICS THAN IN THE AGGREGATE STATISTICS

LET r BE THE PERCENTAGE OF TOKENS IN U CONTAINED IN THE TOKENIZED REPRESENTATION OF THE URL STRING FOR Y

RETURN q+r
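
The priority calculation of FIG. 5A can be sketched as below. The figure does not define "higher representation", so this sketch assumes it means a higher relative frequency (count divided by total count). It also expresses q and r as fractions between 0 and 1 rather than percentages, which does not change the resulting order. The function and parameter names are hypothetical.

```python
from collections import Counter


def overrepresented(specific, aggregate):
    """Items with a higher relative frequency in `specific` than in `aggregate`.

    Assumption: "representation" is count / total. The comparison is
    cross-multiplied so it uses exact integer arithmetic.
    """
    s_total = sum(specific.values())
    a_total = sum(aggregate.values())
    return {item for item, count in specific.items()
            if count * a_total > aggregate[item] * s_total}


def fraction_present(reference, observed):
    """Fraction of `reference` items that also occur in `observed`."""
    return len(reference & set(observed)) / len(reference) if reference else 0.0


def priority(inlink_words, y_url_tokens, pred_words, agg_words, pred_tokens, agg_tokens):
    """Return q + r for a candidate URL Y, following FIG. 5A."""
    w = overrepresented(pred_words, agg_words)      # the word set W
    q = fraction_present(w, inlink_words)           # words of W seen in pages linking to Y
    u = overrepresented(pred_tokens, agg_tokens)    # the URL token set U
    r = fraction_present(u, y_url_tokens)           # tokens of U seen in Y's own URL
    return q + r
```

Here `inlink_words` stands for the words of all crawled documents that link to Y, and `y_url_tokens` for the tokens of Y's URL string. Both are gathered by the caller.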
FIG. 5B

570  ( START )

FOR EACH CANDIDATE URL ON URL LIST DO LOOP

CALCULATE PRIORITY VALUE FOR THIS URL Y USING PROCEDURE OF FIG. 5A

586  ( STOP )
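
The loop of FIG. 5B, followed by the re-ordering step of FIG. 2, reduces to a keyed sort. In this minimal sketch `priority_of` is a hypothetical callable standing in for the procedure of FIG. 5A, and placing the highest priority first is an assumption.

```python
def reorder_candidates(candidates, priority_of):
    """Score every candidate URL once and return the list with the best score first."""
    # sorted() calls the key function once per URL and is stable, so URLs with
    # equal priority keep their existing relative order.
    return sorted(candidates, key=priority_of, reverse=True)
```

For example, with scores `{"a": 0.2, "b": 1.5, "c": 0.2}` the list `["a", "b", "c"]` becomes `["b", "a", "c"]`.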
FIG. 6

600  ( START )

610  REMOVE THE STRING http:// FROM THE URL STRING

REPLACE EACH OCCURRENCE OF "." AND "/" IN THE STRING BY " "

630  NOW REMOVE TOKENS FROM THE STRING USING " " (BLANK) AS SEPARATOR

RETURN CORRESPONDING TOKENS

650  ( STOP )
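
The tokenization steps of FIG. 6 map directly onto a few string operations. The following is a minimal sketch. The function name `tokenize_url` is hypothetical, and removing "http://" only at the start of the string is an assumption.

```python
def tokenize_url(url):
    """Split a URL string into tokens following the steps of FIG. 6."""
    # Remove the string "http://" (assumed to appear only as a leading prefix).
    if url.startswith("http://"):
        url = url[len("http://"):]
    # Replace each "." and "/" with a blank.
    url = url.replace(".", " ").replace("/", " ")
    # Split on blanks; str.split() with no argument also discards empty pieces.
    return url.split()
```

For example, `tokenize_url("http://www.example.com/cars/index.html")` returns `["www", "example", "com", "cars", "index", "html"]`.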



